Multivariate information measures: an experimentalist's perspective 
Information theory has long been used to quantify interactions between two variables. With
the rise of complex systems research, multivariate information measures are increasingly needed. 
Although the bivariate information measures developed by Shannon are commonly agreed upon, 
the multivariate information measures in use today have been developed by many different groups, 
and differ in subtle, yet significant ways. Here, we will review the information theory behind 
each measure, as well as examine the differences between these measures by applying them to 
several simple model systems. In addition to these systems, we will illustrate the usefulness of the 
information measures by analyzing neural spiking data from a dissociated culture through early 
stages of its development. We hope that this work will aid other researchers as they seek the best 
multivariate information measure for their specific research goals and system. Finally, we have made 
software available online which allows the user to calculate all of the information measures discussed 
within this paper. 

PACS numbers: 89.70.Cf, 89.75.Fb, 87.19.lo, 87.19.lv



I. INTRODUCTION 

Information theory has proved to be a useful tool in
many disciplines. It has been successfully applied in several
areas of research, including neuroscience [1], data
compression [2], coding [3], dynamical systems [4], and
genetic coding [5], to name a few. Information theory's
broad applicability is due in part to the fact that it
relies only on the probability distributions associated with
one or more variables. Generally speaking, information
theory uses these probability distributions to ascertain
whether the values of the variables are related and, depending on
the situation, the way in which they are related. As a
result, information theory can be applied to linear
and non-linear systems, although this does not guarantee
that an information-based measure will capture all non-linear
contributions. In summary, information theory is
a model-independent approach.

Information theoretic approaches to problems involv- 
ing one and two variables are well understood and widely 
used. In addition to the one and two variable measures, 
several information measures have been introduced to analyze
the relationships or interactions between three or
more variables [6-13]. These multivariate information
measures have been applied in physical systems [14, 15],
biological systems [16, 17], and neuroscience [18-20].
However, these multivariate information measures differ 
in significant and sometimes subtle ways. Furthermore, 
the notation and naming associated with these measures 
is inconsistent throughout the literature (see, for exam- 
ple, 0,111 IHril). Within this paper, we will examine 
a wide array of multivariate information measures in an 
attempt to clearly articulate the different measures and 
their uses. After reviewing the information theory behind 
each individual measure, we will apply the information
measures to several model systems in order to illuminate 
their differences and similarities. Also, we will apply the 
information measures to neural spiking data from a disso- 
ciated neural culture. Our goal is to clarify these methods 
for other researchers as they search for the multivariate 
information measure that will best address their specific 
research goals. 

In order to facilitate the use of the information measures
discussed in this paper, we have made our MATLAB
software freely available; it can be used to calculate
all of the information measures discussed herein [24].
An earlier version of this work was previously posted on
the arXiv [25].



II. SYNERGY AND REDUNDANCY 

A crucial topic related to multivariate information 
measures is the distinction between synergy and redun- 
dancy. With regard to these information measures, the 
precise meanings of "synergy" and "redundancy" have 
not been established, though they have been invoked by 
many researchers in this field (see, for instance, fl3l . [lllh 
For a recent treatment of synergy in this context, see [26].

To begin to understand synergy, we can use a simple
system. Suppose two variables (call them X_1 and X_2)
provide some information about a third variable (call it
Y). In other words, if you know the states of X_1 and X_2,
then you know something about the state of Y. Loosely,
the portion of that information that is not provided by
knowing X_1 alone or X_2 alone is said to be provided
synergistically by X_1 and X_2. The synergy is the
bonus information received by knowing X_1 and X_2 together,
instead of separately.

We can take a similar initial approach to redundancy.
Again, suppose X_1 and X_2 provide some information
about Y. The common portion of the information X_1
provides alone and the information X_2 provides alone is
said to be provided redundantly by X_1 and X_2. The redundancy
is the information received from both X_1 and
X_2.

These imprecise definitions may seem clear enough, 
but in attempting to measure these quantities, re- 
searchers have created distinct measures that produce 
different results. Based on the fact that the overall goal 
has not been clearly defined, it cannot be said that one of 
these measures is "correct." Rather, each measure has its 
own uses and limitations. Using the simple systems be- 
low, we will attempt to clearly articulate the differences 
between the multivariate information measures. 



III. MULTIVARIATE INFORMATION 
MEASURES 

In this section we will discuss the various multivari- 
ate information theoretic measures that have been intro- 
duced previously. Of special note is the fact that the 
names and notation used in the literature have not been 
consistent. We will attempt to clarify the discussion as 
much as possible by listing alternative names when ap- 
propriate. We will refer to an information measure by its 
original name (or at least, its original name to the best 
of our knowledge). 



A. Entropy and mutual information 

The information theoretic quantities involving one and 
two variables are well-defined and their results are well- 
understood. Regarding the probability distribution of 
one variable (call it p(x)), the canonical measure is the 
entropy H(X) [13]. The entropy is given by [28]:

$$H(X) = -\sum_{x \in X} p(x) \log\left(p(x)\right) \tag{1}$$



The entropy quantifies the amount of uncertainty that is 
present in the probability distribution. If the probability 
distribution is concentrated near one value, the entropy 
will be low. If the probability distribution is uniform, the 
entropy will be at a maximum. 
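
The two limiting cases described above can be checked numerically. The following is a minimal sketch, not the MATLAB software mentioned earlier; it assumes base-2 logarithms (so that results are in bits, the unit quoted in the examples later in this paper), and the function name `entropy` and the three test distributions are illustrative choices.

```python
import math

def entropy(p):
    """Shannon entropy (Eq. 1) in bits of a discrete distribution given as a list."""
    # States with zero probability contribute nothing (0 * log 0 is taken as 0).
    return -sum(q * math.log2(q) for q in p if q > 0)

h_point = entropy([1.0, 0.0, 0.0, 0.0])        # all probability on one value
h_uniform = entropy([0.25, 0.25, 0.25, 0.25])  # uniform over four values
h_skewed = entropy([0.7, 0.1, 0.1, 0.1])       # concentrated, but not certain
```

Under these assumptions the concentrated distribution gives zero entropy, the uniform distribution gives the maximum of log2(4) = 2 bits, and the skewed distribution falls in between.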

When examining the relationship between two vari- 
ables, the mutual information (I) quantifies the amount 
of information provided about one of the variables by 
knowing the value of the other [27]. The mutual information
is given by:

$$I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y) \tag{2}$$

where the conditional entropy is given by:

$$H(X|Y) = \sum_{y \in Y} p(y) H(X|y) = \sum_{y \in Y} p(y) \sum_{x \in X} p(x|y) \log\frac{1}{p(x|y)} \tag{3}$$



The mutual information can also be written as the
Kullback-Leibler divergence between the joint probability
distribution of the actual data and the joint probability
distribution of the independent model (wherein the
joint distribution is equal to the product of the marginal
distributions). This form is given by:

$$I(X;Y) = \sum_{x \in X, y \in Y} p(x,y) \log\left(\frac{p(x,y)}{p(x)p(y)}\right) \tag{4}$$
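
The entropy form in Eq. (2) and the Kullback-Leibler form in Eq. (4) should give the same number for any joint distribution. The sketch below illustrates this under stated assumptions: the joint distribution is stored as a Python dictionary mapping (x, y) pairs to probabilities, logarithms are base 2, and the "noisy copy" system is a hypothetical example that does not appear elsewhere in this paper.

```python
import math
from collections import defaultdict

def marginal(joint, axis):
    """Marginal distribution of one coordinate of a joint dict {(x, y): p}."""
    out = defaultdict(float)
    for state, p in joint.items():
        out[state[axis]] += p
    return dict(out)

def entropy(dist):
    """Entropy in bits of a distribution stored as {state: p}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def mi_from_entropies(joint):
    # Eq. (2): I(X;Y) = H(X) + H(Y) - H(X,Y)
    return entropy(marginal(joint, 0)) + entropy(marginal(joint, 1)) - entropy(joint)

def mi_from_kl(joint):
    # Eq. (4): divergence between the joint and the product of its marginals
    px, py = marginal(joint, 0), marginal(joint, 1)
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

# Hypothetical example: Y is a copy of a fair bit X that is flipped 10% of the time.
noisy_copy = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
# Two independent fair bits.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}
```

Both functions return about 0.531 bits for the noisy copy and zero for the independent pair, as expected when the joint distribution equals the product of the marginals.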



The mutual information can be used as a measure
of the interactions among more than two variables by
grouping the variables into sets and treating each set as
a single vector-valued variable. For instance, the mutual
information can be calculated between Y and the
set S = {X_1, X_2} in the following way:

$$I(Y;S) = \sum_{y \in Y, x_1 \in X_1, x_2 \in X_2} p(y,x_1,x_2) \log\left(\frac{p(y,x_1,x_2)}{p(y)p(x_1,x_2)}\right) \tag{5}$$

However, when the mutual information is considered as
in Eq. (5), it is not possible to separate contributions
from individual X variables in the set S. Still, by varying
the number of variables in S, the mutual information
in Eq. (5) can be used to measure the gain or loss in
information about Y by those variables in S. Along these
lines, Bettencourt et al. used the mutual information
between one variable (in their case, the activity of a neuron)
and many other variables considered together (in
their case, the activities of a group of other neurons) in
order to examine the relationship between the amount
of information the group of neurons provided about the
single neuron and the number of neurons considered in the
group.

The mutual information can be conditioned upon a
third variable to yield the conditional mutual information
[27]. It is given by:

$$I(X;Y|Z) = \sum_{z \in Z} p(z) \sum_{x \in X, y \in Y} p(x,y|z) \log\frac{p(x,y|z)}{p(x|z)p(y|z)} = \sum_{x \in X, y \in Y, z \in Z} p(x,y,z) \log\frac{p(z)p(x,y,z)}{p(x,z)p(y,z)} \tag{6}$$



The conditional mutual information quantifies the 
amount of information one variable provides about a sec- 
ond variable when a third variable is known. 
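
A short numerical illustration of Eq. (6) may help. The sketch below evaluates the right-hand form of Eq. (6) for an XOR-gate, assuming uniform, independent inputs and base-2 logarithms; the dictionary representation and the helper names `marg` and `conditional_mi` are illustrative assumptions.

```python
import math
from collections import defaultdict

def marg(joint, keep):
    """Marginalize a joint dict {(x, y, z): p} onto the coordinate indices in `keep`."""
    out = defaultdict(float)
    for state, p in joint.items():
        out[tuple(state[i] for i in keep)] += p
    return out

def conditional_mi(joint):
    """I(X;Y|Z) in bits via the right-hand form of Eq. (6)."""
    pz, pxz, pyz = marg(joint, [2]), marg(joint, [0, 2]), marg(joint, [1, 2])
    total = 0.0
    for (x, y, z), p in joint.items():
        if p > 0:
            total += p * math.log2(pz[(z,)] * p / (pxz[(x, z)] * pyz[(y, z)]))
    return total

# XOR-gate with uniform inputs: coordinates are (x1, x2, y) with y = x1 XOR x2.
xor_gate = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
cmi = conditional_mi(xor_gate)  # I(X1;X2|Y)
```

The two inputs are independent, so I(X_1;X_2) is zero, yet once the output Y is known each input fully determines the other and the conditional mutual information is one bit.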



B. Interaction information 

The first attempt to quantify the relationship among 
three variables in a joint probability distribution was 
the interaction information (II), which was introduced 
by McGill @. It attempts to extend the concept of the 
mutual information as the information gained about one 
variable by knowing the other. The interaction information
is given by:

$$II(X;Y;Z) = I(X;Y|Z) - I(X;Y) = I(X;Z|Y) - I(X;Z) = I(Z;Y|X) - I(Z;Y) \tag{7}$$

Of the interaction information, McGill said Q, "We 
see that II(X; Y; Z) is the gain (or loss) in sample in- 
formation transmitted between any two of the variables, 
due to the additional knowledge of the third variable." 
The interaction information can also be written as: 

$$II(X;Y;Z) = I(X,Y;Z) - \left(I(X;Z) + I(Y;Z)\right) \tag{8}$$

In the form given in Eq. (8), the interaction information
has been widely used in the literature and has
often been referred to as the synergy [l|| 0, HH, HH and 
redundancy-synergy index [9]. Some authors have used
the term "synergy" because they have interpreted a pos- 
itive interaction information result to imply a synergistic 
interaction among the variables and a negative interac- 
tion information result to imply a redundant interaction 
among the variables. Thus, if we assume this interpre- 
tation of the interaction information and that the inter- 
action information correctly measures multivariate inter- 
actions, then synergy and redundancy are taken to be 
mutually exclusive qualities of the interactions between 
variables. This view will find a counterpoint in the par- 
tial information decomposition to be discussed below. 

The interaction information can also be written as an 
expansion of the entropies and joint entropies of the vari- 
ables: 

$$II(X;Y;Z) = -H(X) - H(Y) - H(Z) + H(X,Y) + H(X,Z) + H(Y,Z) - H(X,Y,Z) \tag{9}$$
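
The sign convention in Eq. (9) can be checked on small systems. The sketch below is illustrative only: it assumes base-2 logarithms and uniform, independent gate inputs, and the "three identical bits" system is a hypothetical example chosen because it is purely redundant.

```python
import math
from collections import defaultdict
from itertools import combinations

def entropy_of(joint, idx):
    """Joint entropy (bits) of the coordinates `idx` of a joint dict {state: p}."""
    m = defaultdict(float)
    for state, p in joint.items():
        m[tuple(state[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in m.values() if p > 0)

def interaction_information(joint):
    """Entropy expansion of Eq. (9) for three variables."""
    singles = sum(entropy_of(joint, [i]) for i in range(3))
    pairs = sum(entropy_of(joint, list(c)) for c in combinations(range(3), 2))
    return -singles + pairs - entropy_of(joint, [0, 1, 2])

xor_gate = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
and_gate = {(a, b, a & b): 0.25 for a in (0, 1) for b in (0, 1)}
copies = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}  # hypothetical: three identical bits
```

With this convention the XOR-gate gives +1 bit, the AND-gate gives about +0.189 bits, and the three identical bits give -1 bit, in line with the reading of positive values as synergy and negative values as redundancy.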

This form leads to an expansion for the interaction
information for N variables [32]. If S =
{X_1, X_2, ..., X_N}, then the interaction information becomes:

$$II(S) = -\sum_{T \subseteq S} (-1)^{|S|-|T|} H(T) \tag{10}$$

In Eq. (10), T is a subset of S and |S| denotes the set
size of S.

A measure similar to the interaction information
was introduced by Bell and is referred to as the co-information
(CI) [33]. It is given by the following expansion:

$$CI(S) = -\sum_{T \subseteq S} (-1)^{|T|} H(T) = (-1)^{|S|} II(S) \tag{11}$$



Clearly, the co-information is equal to the interaction 
information when S contains an even number of variables 
and is equal to the negative of the interaction information 
when S contains an odd number of variables. So, for the 
three variable case, the co-information becomes: 



$$CI(X;Y;Z) = I(X;Y) - I(X;Y|Z) = I(X;Z) + I(Y;Z) - I(X,Y;Z) \tag{12}$$



Because the co-information is directly related to the 
interaction information for systems with any number of 
variables, we will forgo presenting results from the co- 
information. The co-information has also been referred 
to as the generalized mutual information (is! ]. 



C. Total correlation 

The interaction information finds its conceptual base 
in extending the idea of the mutual information as the 
information gained about a variable when the other vari- 
able is known. Alternatively, we could extend the idea 
of the mutual information as the Kullback-Leibler divergence
between the joint distribution and the independent
model. If we do this, we arrive at the total correlation 
(TC) introduced by Watanabe Q ■ It is given by: 



$$TC(S) = \sum_{\vec{x} \in S} p(\vec{x}) \log\left(\frac{p(\vec{x})}{p(x_1)p(x_2) \cdots p(x_n)}\right) \tag{13}$$

In Eq. (13), \vec{x} is a vector containing individual states of
the X variables. The total correlation can also be written
in terms of entropies as:

$$TC(S) = \left(\sum_{X_i \in S} H(X_i)\right) - H(S) \tag{14}$$




In this form, the total correlation has been referred to as 
the multi-information [ll| , the spatial stochastic interac- 
tion HH, and the integration [tH H^. Using Eq. $Z§, 
the total correlation can also be written using a series 
of mutual information terms (see Appendix [A] for more 
details): 



$$TC(S) = I(X_1;X_2) + I(X_1,X_2;X_3) + \ldots + I(X_1,\ldots,X_{n-1};X_n) \tag{15}$$



D. Dual total correlation



After the total correlation was introduced, a measure 
with a similar structure, called the dual total correlation 
(DTC), was introduced by Han [1, The dual total 
correlation is given by: 

$$DTC(S) = \left(\sum_{X_i \in S} H(S/X_i)\right) - (n-1)H(S) \tag{16}$$

In Eq. (16), S/X_i is the set S with X_i removed and
n is the number of X variables in S. The dual total
correlation can also be written as [3|| : 

$$DTC(S) = H(S) - \sum_{X_i \in S} H(X_i|S/X_i) \tag{17}$$

The dual total correlation calculates the amount of entropy
present in S beyond the sum of the entropies for
each variable conditioned upon all other variables. The
dual total correlation has also been referred to as the ex- 
cess entropy [36| and the binding information (HJ . Using 
Eq. @, (fT4"| . and (fTCj) . the dual total correlation can 
also be related to the total correlation by (see Appendix 
[B]for more details): 



$$DTC(S) = \left(\sum_{X_i \in S} I(S/X_i;X_i)\right) - TC(S) \tag{18}$$
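
The entropy forms of the total correlation and the dual total correlation, and the relation between them in Eq. (18), can be verified on a small system. The following sketch assumes base-2 logarithms and treats the XOR-gate (two uniform, independent inputs and their output) as a three-variable set S; the function names are illustrative.

```python
import math
from collections import defaultdict

def entropy_of(joint, idx):
    """Joint entropy (bits) of the coordinates `idx` of a joint dict {state: p}."""
    m = defaultdict(float)
    for state, p in joint.items():
        m[tuple(state[i] for i in idx)] += p
    return -sum(p * math.log2(p) for p in m.values() if p > 0)

def total_correlation(joint, n):
    # Eq. (14): sum of the single-variable entropies minus the joint entropy
    return sum(entropy_of(joint, [i]) for i in range(n)) - entropy_of(joint, list(range(n)))

def dual_total_correlation(joint, n):
    # Eq. (16): sum of the leave-one-out joint entropies minus (n - 1) H(S)
    everything = list(range(n))
    leave_one_out = sum(entropy_of(joint, [j for j in everything if j != i])
                        for i in everything)
    return leave_one_out - (n - 1) * entropy_of(joint, everything)

def dtc_via_eq18(joint, n):
    # Eq. (18), with each I(S/X_i; X_i) expanded into entropies using Eq. (2)
    everything = list(range(n))
    h_all = entropy_of(joint, everything)
    mi_sum = sum(entropy_of(joint, [j for j in everything if j != i])
                 + entropy_of(joint, [i]) - h_all
                 for i in everything)
    return mi_sum - total_correlation(joint, n)

xor_gate = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
tc = total_correlation(xor_gate, 3)
dtc = dual_total_correlation(xor_gate, 3)
```

For this system the total correlation is 1 bit and the dual total correlation is 2 bits, and Eq. (16) and Eq. (18) agree.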



E. ΔI

A distinct information measure, called ΔI, was introduced
by Nirenberg and Latham. It was introduced
to measure the importance of correlations in neural
coding. For the purposes of this paper, we can apply ΔI
to the following situation: consider some set of X variables
(call this set S). The values of the variables in S
are related in some way to the value of another variable
(call it Y). In Nirenberg and Latham's original work,
the X variables are signals from neurons and the Y variable
is the value of some stimulus variable. ΔI compares
the true probability distributions associated with these
variables to one that assumes the X variables act independently
(i.e., there are no correlations between the X
variables beyond those that can be explained by Y). If
these distributions are similar, then it can be assumed
that there are no relevant correlations between the X
variables. If, on the other hand, these distributions are
not similar, then we can conclude that relevant correlations
are present between the X variables.

The independent model assumes that the X variables 
act independently, so we can form the probability for the 
X states conditioned upon the Y variable state using a 
simple product: 



$$p_{ind}(\vec{x}|y) = \prod_i p(x_i|y) \tag{19}$$

Then, the conditional probability of the Y variable on
the X variables can be found using Bayes' theorem:

$$p_{ind}(y|\vec{x}) = \frac{p_{ind}(\vec{x}|y)p(y)}{p_{ind}(\vec{x})} \tag{20}$$

The independent joint distribution of the X variables
is given by:

$$p_{ind}(\vec{x}) = \sum_{y \in Y} p_{ind}(\vec{x}|y)p(y) \tag{21}$$

Then, ΔI is given by the weighted Kullback-Leibler distance
between the conditional probability of the Y variable
on the X variables for the independent model and
the actual conditional probability of the same type:

$$\Delta I(S;Y) = \sum_{\vec{x} \in S} p(\vec{x}) \sum_{y \in Y} p(y|\vec{x}) \log\left(\frac{p(y|\vec{x})}{p_{ind}(y|\vec{x})}\right) \tag{22}$$

About ΔI, Nirenberg and Latham say [37]:
"Specifically, ΔI is the cost in yes/no questions
for not knowing about correlations: if one were guessing
the value of the Y variable based on the X variables,
x, then it would take, on average, ΔI more questions
to guess the value of Y if one knew nothing about the
correlations than if one knew everything about them
[Variable names changed to match this work]."
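
The construction in Eqs. (19) to (22) can be followed step by step in code. The sketch below is one possible reading of those equations, not Nirenberg and Latham's implementation; it assumes base-2 logarithms, an AND-gate with uniform, independent inputs, and a joint distribution stored as a dictionary keyed by (x_1, ..., x_n, y). The function names are illustrative.

```python
import math
from collections import defaultdict
from itertools import product

def independent_posterior(joint, n):
    """p_ind(y|x) from Eqs. (19)-(21) for a joint dict {(x_1, ..., x_n, y): p}."""
    p_y = defaultdict(float)
    p_xi_y = [defaultdict(float) for _ in range(n)]  # joint of each x_i with y
    for state, p in joint.items():
        y = state[n]
        p_y[y] += p
        for i in range(n):
            p_xi_y[i][(state[i], y)] += p
    values = [sorted({s[i] for s in joint}) for i in range(n)]
    posterior = {}
    for x in product(*values):
        # Eq. (19): p_ind(x|y) as a product of single-variable conditionals
        like = {y: math.prod(p_xi_y[i].get((x[i], y), 0.0) / p_y[y] for i in range(n))
                for y in p_y}
        norm = sum(like[y] * p_y[y] for y in p_y)  # Eq. (21)
        for y in p_y:
            if norm > 0:
                posterior[(x, y)] = like[y] * p_y[y] / norm  # Eq. (20)
    return posterior

def delta_i(joint, n):
    """Eq. (22), summed over joint states using p(x) p(y|x) = p(x, y)."""
    p_x = defaultdict(float)
    for state, p in joint.items():
        p_x[state[:n]] += p
    post = independent_posterior(joint, n)
    return sum(p * math.log2((p / p_x[s[:n]]) / post[(s[:n], s[n])])
               for s, p in joint.items() if p > 0)

and_gate = {(a, b, a & b): 0.25 for a in (0, 1) for b in (0, 1)}
xor_gate = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
post = independent_posterior(and_gate, 2)
```

Under these assumptions the independent model assigns p_ind(y = 1 | x_1 = 1, x_2 = 1) = 0.75 for the AND-gate, and ΔI evaluates to about 0.104 bits for the AND-gate and 1 bit for the XOR-gate.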



F. Redundancy-synergy index 

Another multivariate information measure was intro- 
duced by Chechik et al. Q. This measure was originally 
referred to as the redundancy-synergy index (RSI) and it 
was created as an extension of the interaction informa- 
tion. It is given by: 



$$RSI(S;Y) = I(S;Y) - \sum_{X_i \in S} I(X_i;Y) \tag{23}$$



The redundancy-synergy index is designed to be maximal
and positive when the variables in S are purported
to provide synergistic information about Y. It should
be negative when the variables in S provide redundant
information about Y. When S contains two variables, 
the redundancy-synergy index is equal to the interaction 
information. The negative of the redundancy-synergy in- 
dex has also been referred to as the redundancy fllT |. 



G. Varadan's synergy 

Yet another multivariate information measure was in- 
troduced by Varadan et al. [12]. In the original work, 
this measure is referred to as the synergy, but to avoid 
confusing it with other measures, we will refer to this 
measure as Varadan's synergy (VS). It is given by: 

$$VS(S;Y) = I(S;Y) - \max \sum_j I(S_j;Y) \tag{24}$$

In Eq. (24), S_j refers to the possible sub-sets of S, and the maximum
is taken over the ways of partitioning S into such sub-sets. So, for
instance, if S = {X_1, X_2, X_3}, Varadan's synergy would
be given by:

$$VS(S;Y) = I(S;Y) - \max\begin{cases} I(X_1;Y) + I(X_2,X_3;Y) \\ I(X_2;Y) + I(X_1,X_3;Y) \\ I(X_3;Y) + I(X_1,X_2;Y) \\ I(X_1;Y) + I(X_2;Y) + I(X_3;Y) \end{cases} \tag{25}$$



Similar to the interaction information, when Varadan's 
synergy is positive, the variables in S are said to provide 
synergistic information about Y, while when Varadan's 
synergy is negative, the variables in S are said to provide 
redundant information about Y. Note that, when S =
{X_1, X_2}, Varadan's synergy is equal to the interaction
information.
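
The difference between Eq. (23) and Eq. (24) only appears when S contains more than two variables. The sketch below is a hedged illustration: it assumes base-2 logarithms, takes the maximum in Eq. (24) over partitions of S with at least two blocks (as in Eq. (25)), and uses a hypothetical three-input system that is not one of the examples in this paper.

```python
import math
from collections import defaultdict

def mutual_info(joint, xs, y):
    """I(X_xs; Y) in bits for coordinate indices xs and y of a joint dict {state: p}."""
    pxy, px, py = defaultdict(float), defaultdict(float), defaultdict(float)
    for state, p in joint.items():
        xv, yv = tuple(state[i] for i in xs), state[y]
        pxy[(xv, yv)] += p
        px[xv] += p
        py[yv] += p
    return sum(p * math.log2(p / (px[xv] * py[yv]))
               for (xv, yv), p in pxy.items() if p > 0)

def partitions(items):
    """Yield every partition of the list `items` into non-empty blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for part in partitions(rest):
        yield [[first]] + part                  # `first` in a block of its own
        for k in range(len(part)):              # or added to an existing block
            yield part[:k] + [[first] + part[k]] + part[k + 1:]

def rsi(joint, xs, y):
    # Eq. (23)
    return mutual_info(joint, xs, y) - sum(mutual_info(joint, [i], y) for i in xs)

def varadan_synergy(joint, xs, y):
    # Eq. (24), maximizing over partitions with at least two blocks as in Eq. (25)
    best = max(sum(mutual_info(joint, block, y) for block in part)
               for part in partitions(list(xs)) if len(part) > 1)
    return mutual_info(joint, xs, y) - best

# Hypothetical system: y = x1 XOR x2, while x3 is an unrelated fair coin.
system = {(a, b, c, a ^ b): 0.125 for a in (0, 1) for b in (0, 1) for c in (0, 1)}
# Two-input XOR-gate for comparison.
xor_gate = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
```

For the hypothetical three-input system the redundancy-synergy index is 1 bit, while Varadan's synergy is 0 because the partition {X_1, X_2}, {X_3} already accounts for all of I(S;Y). For the two-input XOR-gate both measures equal 1 bit, which matches the interaction information.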



H. Partial information decomposition

Finally, we will examine the collection of information 
values introduced by Williams and Beer in the partial 
information decomposition (PID) [13]. (For three other
applications of the partial information decomposition, see 
recent works by James et al. [Hj], Flecker et al. [HI], and 
Griffith and Koch [111). The partial information decom- 
position is a method of dissecting the mutual informa- 
tion between a set of variables S and one other variable 

Y into non-overlapping terms. These terms quantify the
information provided by the set of variables in S about
Y uniquely, redundantly, synergistically, and in mixed
forms. The partial information decomposition has several
potential advantages over other measures. First, it
produces only non-negative results, unlike the interaction
information. Second, it allows for the possibility of synergistic
and redundant interactions simultaneously, unlike
the interaction information and ΔI.

For the sake of brevity, we will not describe the entire
partial information decomposition here, but we will describe
the case where S = {X_1, X_2}. A description of the
general case can be found in Williams and Beer's original
work [13]. The relevant mutual informations are equal to
sums of the partial information terms. For the case of
two X variables, there are only four possible terms.
Information about Y can be provided uniquely by each X
variable, redundantly by both X variables, or synergistically
by both X variables together. Written out, the
relevant mutual informations are given by the following
sums:

$$I(X_1,X_2;Y) = \mathrm{Synergy}(X_1,X_2) + \mathrm{Unique}(X_1) + \mathrm{Unique}(X_2) + \mathrm{Redundancy}(X_1,X_2) \tag{26}$$

$$I(X_1;Y) = \mathrm{Unique}(X_1) + \mathrm{Redundancy}(X_1,X_2) \tag{27}$$

$$I(X_2;Y) = \mathrm{Unique}(X_2) + \mathrm{Redundancy}(X_1,X_2) \tag{28}$$

The relevant mutual information values can be calcu- 
lated easily. As described by Williams and Beer, the 
redundancy term is equal to a new information expres- 
sion: the minimum information function. This function 
attempts to capture the intuitive view that the redun- 
dant information for a given state of Y is the informa- 
tion that is contributed by both X variables about that 
state of Y (consult Williams and Beer's original work 
[13] for details and further motivation). The minimum
information function is related to the specific information 
flol ] fill ] . The specific information is given by: 



$$I_{spec}(y;X) = \sum_{x \in X} p(x|y) \left[\log\frac{1}{p(y)} - \log\frac{1}{p(y|x)}\right] \tag{29}$$

In Eq. (29), the specific information quantifies the
amount of information provided by X about a specific
state of the Y variable.

The minimum information can then be calculated by
comparing the amount of information provided by the
different X variables for each state of the Y variable
considered individually:

$$I_{min}(Y;X_1,X_2) = \sum_{y \in Y} p(y) \min_{X_i} I_{spec}(y;X_i) \tag{30}$$

The minimum in Eq. (30) is taken over each X variable
considered separately. Once the redundancy term is
calculated via the minimum information function, the
remaining partial information terms can be calculated with
ease.
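
The two-input procedure described above (redundancy from Eq. (30), then the remaining terms from Eqs. (26) to (28)) can be sketched as follows. This is an illustrative reading of the equations rather than Williams and Beer's code; it assumes base-2 logarithms, uniform and independent gate inputs, and a joint distribution stored as a dictionary keyed by (x_1, x_2, y). The function names and dictionary keys are assumptions.

```python
import math
from collections import defaultdict

def mutual_info(joint, xs, y):
    """I(X_xs; Y) in bits for coordinate indices xs and y of a joint dict {state: p}."""
    pxy, px, py = defaultdict(float), defaultdict(float), defaultdict(float)
    for state, p in joint.items():
        xv, yv = tuple(state[i] for i in xs), state[y]
        pxy[(xv, yv)] += p
        px[xv] += p
        py[yv] += p
    return sum(p * math.log2(p / (px[xv] * py[yv]))
               for (xv, yv), p in pxy.items() if p > 0)

def specific_info(joint, xi, y_index, y_value):
    """Eq. (29): information that X_i provides about the single state Y = y_value."""
    p_x, p_xy, p_y = defaultdict(float), defaultdict(float), 0.0
    for state, p in joint.items():
        p_x[state[xi]] += p
        if state[y_index] == y_value:
            p_xy[state[xi]] += p
            p_y += p
    # p(x|y) = p_xy / p_y and p(y|x) = p_xy / p_x, so the bracket in Eq. (29)
    # equals log(p(y|x) / p(y)).
    return sum((pxy / p_y) * math.log2((pxy / p_x[x]) / p_y)
               for x, pxy in p_xy.items() if pxy > 0)

def pid_two_inputs(joint):
    """Two-input partial information terms for a joint dict {(x1, x2, y): p}."""
    p_y = defaultdict(float)
    for state, p in joint.items():
        p_y[state[2]] += p
    redundancy = sum(p * min(specific_info(joint, 0, 2, y), specific_info(joint, 1, 2, y))
                     for y, p in p_y.items() if p > 0)             # Eq. (30)
    unique_1 = mutual_info(joint, [0], 2) - redundancy             # Eq. (27)
    unique_2 = mutual_info(joint, [1], 2) - redundancy             # Eq. (28)
    synergy = mutual_info(joint, [0, 1], 2) - unique_1 - unique_2 - redundancy  # Eq. (26)
    return {"redundancy": redundancy, "unique_1": unique_1,
            "unique_2": unique_2, "synergy": synergy}

and_gate = {(a, b, a & b): 0.25 for a in (0, 1) for b in (0, 1)}
xor_gate = {(a, b, a ^ b): 0.25 for a in (0, 1) for b in (0, 1)}
x1_gate = {(a, b, a): 0.25 for a in (0, 1) for b in (0, 1)}  # output copies x1
and_terms = pid_two_inputs(and_gate)
```

Under these assumptions the AND-gate yields about 0.311 bits of redundancy, 0.5 bits of synergy, and no unique information; the XOR-gate yields 1 bit of synergy only; and the gate whose output copies X_1 yields 1 bit of unique information from X_1.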

It should also be noted that the partial information decomposition
provides an explanation for negative interaction
information values. To see this, insert the partial
information expansions in Eqs. (26), (27), and (28) into the
mutual information terms in the interaction information:

$$II(X_1;X_2;Y) = I(X_1,X_2;Y) - I(X_1;Y) - I(X_2;Y) = \mathrm{Synergy}(X_1,X_2) - \mathrm{Redundancy}(X_1,X_2) \tag{31}$$

Thus, the partial information decomposition finds that a 
negative interaction information value implies that the 
redundant contribution is greater than the synergistic 
contribution. Furthermore, the structure of the partial 
information decomposition implies that synergistic and 
redundant interactions are not mutually exclusive, as was 
the case for the traditional interpretation of the interac- 
tion information. Thus, according to the partial infor- 
mation decomposition, there may be non-zero synergistic 
and redundant contributions simultaneously. 

Throughout the remainder of this article, we will label
the various terms in the partial information decomposition
in accordance with the notation used by Williams
and Beer. The term that has been interpreted as the
synergy will be referred to as Π_R(Y; {12}) or PID synergy.
The term that has been interpreted as the redundancy
will be labeled as Π_R(Y; {1}{2}) or PID redundancy.
The unique information terms will be referred to
as Π_R(Y; {1}) and Π_R(Y; {2}), or simply as PID unique
information.

When the partial information decomposition is extended
to the case where S = {X_1, X_2, X_3}, new mixed
terms are introduced to the expansions of the mutual
informations. For instance, information can be supplied
about Y redundantly between X_3 and the synergistic
contribution from X_1 and X_2 (this term is noted as
Π_R(Y; {12}{3})). In total, the partial information decomposition
contains 18 terms when S contains three
variables. It can be shown that the interaction information
between Y and the X variables contained in S is
related to the partial information terms by the following
equation [13]:

$$\begin{aligned}
II(Y;X_1;X_2;X_3) = {} & \Pi_R(Y;\{123\}) + \Pi_R(Y;\{1\}\{2\}\{3\}) - \Pi_R(Y;\{1\}\{23\}) \\
& - \Pi_R(Y;\{2\}\{13\}) - \Pi_R(Y;\{3\}\{12\}) - \Pi_R(Y;\{12\}\{13\}) \\
& - \Pi_R(Y;\{12\}\{23\}) - \Pi_R(Y;\{13\}\{23\}) - 2\Pi_R(Y;\{12\}\{13\}\{23\})
\end{aligned} \tag{32}$$

From Eq. (32), we can see that the four-way interaction
information is related to the partial information decomposition
via a complicated summation of terms.

IV. EXAMPLE SYSTEMS 

We will now apply the multivariate information mea- 
sures discussed above to several simple systems in an at- 
tempt to understand their similarities, differences, and 
uses. These systems have been chosen to maximize the 
contrast between the information measures, but many 
other systems exist for which the information measures 
produce identical results. 

A. Examples 1-3: two-input Boolean logic gates 

The first set of examples we will consider are simple 
Boolean logic gates. These logic gates are well known 
across many disciplines and offer a great deal of simplicity.
The results presented in Table I highlight some of
the commonalities and disparities between the various 
information measures. It should be noted that, due to 
the simple structure of the Boolean logic gates, the to- 
tal correlation is equal to the mutual information. Also, 
due to the fact that only two-input Boolean logic gates 
are being considered, the redundancy-synergy index and 
Varadan's synergy are directly related to the interaction 
information. Additional examples will highlight differ- 
ences between these information measures. 

All information measures provide a similar result for
the XOR-gate (with the exception of the dual total correlation,
see below). The interaction information, ΔI,
the redundancy-synergy index, Varadan's synergy, and
the partial information decomposition all indicate that
the entire bit of information between Y and {X_1, X_2} is
accounted for by synergy. We might expect this result
because, to know the state of Y for an XOR-gate, the
state of both X_1 and X_2 must be known.

The results for the X_1-gate demonstrate the potential utility
of the partial information decomposition. The unique
information term from X_1 is equal to one bit, thus indicating
that the X_1 variable entirely and solely determines
the state of the output variable. This result is confirmed
by the truth table. This result can also be seen by considering
the values of the other measures together (for
instance, the three mutual information measures), but
the partial information decomposition provides these results
more succinctly.

More significant differences among the information
measures appear when considering the AND-gate. The
partial information decomposition produces the result
that 0.311 bits of information are provided redundantly
and 0.5 bits are provided synergistically. Since each X
variable provides the same amount of information about
each state of Y (see Eq. (30)), the partial information decomposition
finds that all of the mutual information between
each X variable individually and the Y variable is
redundant. As a result, no information is provided
uniquely, and subsequently, the entirety of the remaining
0.5 bits of information between Y and {X_1, X_2} must
be synergistic. From this, we can see in action the fact
that the partial information decomposition emphasizes
the amount of information that each X variable provides
about each state of Y considered individually.



TABLE I. Examples 1 to 3: two-input Boolean logic gates.
XOR-gate: All information measures produce consistent results.
X_1-gate: The partial information decomposition succinctly
identifies a relationship between X_1 and Y. AND-gate:
The partial information decomposition identifies both
synergistic and redundant interactions. The interaction information
finds only a synergistic interaction. ΔI identifies the
importance of correlations between X_1 and X_2.

The interaction information, and by extension the
redundancy-synergy index and Varadan's synergy, are
limited to returning only a synergy value of 0.189 bits
for the AND-gate. This value is produced because the
mutual information between Y and {X_1, X_2} contains
an excess of 0.189 bits beyond the sum of the mutual
informations between each X variable individually and
the Y variable. So, here we can see in action the interpretation
of the interaction information as the amount of
information provided by the X variables taken together
about Y, beyond what they provide individually. Also,
the AND-gate allows us to see the relationship between
the interaction information and the partial information
decomposition as expressed by Eq. (31).
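
The AND-gate values quoted above can be reproduced by hand. The worked arithmetic below assumes uniform, independent inputs and base-2 logarithms, which is consistent with the 0.311, 0.5, and 0.189 bit values in the text; all numbers are rounded to three decimal places.

```latex
\begin{aligned}
H(Y) &= -\tfrac{3}{4}\log_2\tfrac{3}{4} - \tfrac{1}{4}\log_2\tfrac{1}{4} \approx 0.811 \\
I(X_1,X_2;Y) &= H(Y) - H(Y|X_1,X_2) \approx 0.811 - 0 = 0.811 \\
I(X_1;Y) = I(X_2;Y) &= H(Y) - H(Y|X_1) \approx 0.811 - 0.5 = 0.311 \\
II(X_1;X_2;Y) &= I(X_1,X_2;Y) - I(X_1;Y) - I(X_2;Y) \approx 0.811 - 2(0.311) = 0.189 \\
\mathrm{Synergy}(X_1,X_2) - \mathrm{Redundancy}(X_1,X_2) &\approx 0.5 - 0.311 = 0.189
\end{aligned}
```

The last two lines agree, as required by the relation between the interaction information and the partial information terms.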

The value of ΔI for the AND-gate can be elucidated
by examining the values of the conditional probability
distributions that are relevant to the calculation of ΔI
(Table II). From these results, it is clear that if we use the
independent model, and we are presented with the state
x_1 = 1 and x_2 = 1, we would conclude that there is a one-quarter
chance that y = 0 and a three-quarters chance
that y = 1. If we use the actual data, then we know
that, for that specific state, y must equal 1. This example
points to a subtle, but critical difference between ΔI and
the other multivariate information measures. Namely,
the other information measures are concerned with discerning
the interactions among the variables in the situation
where you know the values of all the variables
simultaneously, whereas ΔI is concerned with comparing
that situation to the independent model described
by Eq. (19) (see Section IV B for further discussion of
this topic).



TABLE II. Values of conditional probabilities used to calculate
ΔI for the AND-gate.

y   x_1   x_2   p_ind(y|x_1,x_2)   p(y|x_1,x_2)
0   0     0     1                  1
0   0     1     1                  1
0   1     0     1                  1
0   1     1     0.25               0
1   0     0     0                  0
1   0     1     0                  0
1   1     0     0                  0
1   1     1     0.75               1

The values of the dual total correlation for the XOR-gate
example in Table I demonstrate a crucial difference
between the dual total correlation and the other multivariate
information measures. Namely, the dual total correlation
does not differentiate between the X and Y variables.
So, dependencies between all variables are treated
equally. In the case of the XOR-gate, the entropy of any
variable conditioned on the other two is zero. However,
the joint entropy between all variables is 2 bits, so the
dual total correlation is equal to 2 bits. Clearly, this result
is greater than I(X_1, X_2; Y) for this example. So, if we
assume the synergy and redundancy are some portion
of I(X_1, X_2; Y), the dual total correlation cannot be the
synergy or the redundancy. However, this result is not
surprising given the fact that, if we assume the synergy
and redundancy are some portion of I(X_1, X_2; Y), the
synergy and redundancy require some differentiation between
the X variables and Y variables. Since the dual
total correlation does not incorporate this distinction, we
should expect that it measures a fundamentally different
quantity (see Section IV D for further discussion of this
topic).



B. Example 4

Another relevant example for $\Delta I$ is shown in Table III. The crucial point to
draw from this example is that $\Delta I$ can be greater than $I(X_1, X_2; Y)$. This
appears to be in conflict with the intuitive notion of synergy as some part of the
information the X variables provide about the Y variable. Why, in this case,
$\Delta I$ is greater than $I(X_1, X_2; Y)$ is not immediately clear. To better
understand this result, we can examine the difference between $\Delta I$ and
$I(X_1, X_2; Y)$. Using the expressions for the mutual information and $\Delta I$
given earlier, this difference can be expressed as:

$$I(S;Y) - \Delta I(S;Y) = \sum_{x \in S,\, y \in Y} p(y,x) \log \frac{p_{ind}(x|y)}{p_{ind}(x)} \qquad (33)$$

The quantity expressed on the RHS of Eq. (33), though similar in form, is not a
mutual information. Based on the example in Table III and the examples in Table I,
this quantity can be positive or negative. Schneidman et al. further explore this
and other noteworthy features of $\Delta I$ [lj|. Fundamentally, $\Delta I$ is a
comparison between the complete data and an independent model (as expressed in
Eq. (19)). As Schneidman et al. note, alternative models could be chosen for the
purpose of measuring the importance of correlations between the X variables in the
data. We wish to emphasize that $\Delta I$ can provide useful information about a
system, but that it measures a fundamentally different quantity in comparison to
the other multivariate information measures.

TABLE III. Example 4. For this system, $\Delta I$ is greater than $I(X_1, X_2; Y)$.
Schneidman et al. also present an example that demonstrates that $\Delta I$ is not
bounded by $I(X_1, X_2; Y)$ [jjj.



C. Example 5


The example shown in Table IV highlights some interesting differences between the
information measures, especially regarding the partial information decomposition.
Results from the partial information decomposition indicate that 1 bit of
information about Y is provided redundantly by $X_1$ and $X_2$, while 1 bit is
provided synergistically. This situation is similar to the AND-gate above. Each X
variable provides 1 bit of information about Y, but both X variables provide the
same amount of information about each state of Y. So, the partial information
decomposition concludes that all of the information each X variable provides
individually is redundant. It should be noted that this is the case despite the
fact that $X_1$ and $X_2$ provide information about different states of Y. $X_1$
can differentiate between $y = 0$ and $y = 2$ on the one hand and $y = 1$ and
$y = 3$ on the other, while $X_2$ can differentiate between $y = 0$ and $y = 1$ on
the one hand and $y = 2$ and $y = 3$ on the other. Even though the X variables
provide information about different states of Y, the partial information
decomposition is blind to this distinction and concludes, since the X variables
provide the same amount of information about each state of Y, that their
contributions are redundant. Because all of the mutual information between each X
variable, considered individually, and Y is taken up by redundant information, the
partial information decomposition concludes there is no unique information and,
thus, the remaining 1 bit of information must be synergistic.

TABLE IV. Example 5: Y obtains a different state for each unique combination of
$X_1$ and $X_2$. The partial information decomposition indicates the presence of
redundancy because the X variables provide the same amount of information about
each state of Y, despite the fact that the X variables provide information about
different states of Y. The interaction information and $\Delta I$ provide null
results. Griffith and Koch also discuss this example in relation to multivariate
information measures [26].
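The decomposition described for Example 5 can be reproduced with a short calculation. The sketch below assumes that the four input combinations are equally likely and that Y is encoded as $y = x_1 + 2 x_2$, which matches the state groupings described in the text but is not copied from Table IV. It also assumes that the minimum information of Eq. (30) is the usual expected minimum of the specific information over the X variables. All function names are illustrative.

```python
from collections import defaultdict
from math import log2


def marginal(p, idx):
    """Marginalize a joint distribution {state: prob} onto the positions in idx."""
    out = defaultdict(float)
    for state, prob in p.items():
        out[tuple(state[i] for i in idx)] += prob
    return out


def entropy(p, idx):
    """Shannon entropy in bits of the variables at positions idx."""
    return -sum(q * log2(q) for q in marginal(p, idx).values() if q > 0)


def mutual_information(p, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for two groups of positions."""
    return entropy(p, a) + entropy(p, b) - entropy(p, tuple(a) + tuple(b))


def specific_information(p, src, y_idx, y):
    """Information the source provides about the single outcome Y = y."""
    p_y = marginal(p, (y_idx,))[(y,)]
    p_x = marginal(p, src)
    p_xy = marginal(p, tuple(src) + (y_idx,))
    total = 0.0
    for key, pxy in p_xy.items():
        if key[-1] == y and pxy > 0:
            p_y_given_x = pxy / p_x[key[:-1]]
            total += (pxy / p_y) * (log2(1 / p_y) - log2(1 / p_y_given_x))
    return total


def minimum_information(p, sources, y_idx):
    """Expected value over y of the smallest specific information among sources."""
    p_y = marginal(p, (y_idx,))
    return sum(
        prob * min(specific_information(p, s, y_idx, y) for s in sources)
        for (y,), prob in p_y.items()
        if prob > 0
    )


# Assumed form of Example 5: states are (x1, x2, y) with y = x1 + 2 * x2.
example5 = {(x1, x2, x1 + 2 * x2): 0.25 for x1 in (0, 1) for x2 in (0, 1)}

redundancy = minimum_information(example5, [(0,), (1,)], 2)
unique_1 = mutual_information(example5, (0,), (2,)) - redundancy
unique_2 = mutual_information(example5, (1,), (2,)) - redundancy
synergy = (mutual_information(example5, (0, 1), (2,))
           - redundancy - unique_1 - unique_2)
print(redundancy, unique_1, unique_2, synergy)
```

The minimum is taken separately for each state of Y and only compares amounts of information, so it cannot register that $X_1$ and $X_2$ separate different pairs of states.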

Example 5 demonstrates the conditions for null results from the interaction
information and $\Delta I$. When considering the relationship between one of the X
variables and Y, we see that knowing the state of the X variable reduces the
uncertainty about Y by 1 bit in all cases. However, knowing both X variables only
provides 2 bits of information about Y. So, the interaction information must be
zero because no additional information about Y is gained or lost by knowing both X
variables together compared to knowing them each individually. Similarly,
$\Delta I$ must be zero because the knowledge of the state of $X_1$ and $X_2$
simultaneously does not provide any additional knowledge about Y compared to the
independent models for the relationships between each X variable and Y.

TABLE V. Example 6. All information measures, with the exceptions of the total
correlation and the dual total correlation, are zero. The total correlation and
the dual total correlation produce non-zero results because they detect
interactions between the X variables.
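The null results for Example 5 can also be verified directly. This sketch makes three assumptions: Example 5 has four equally likely states with $y = x_1 + 2 x_2$; the two-input interaction information is $I(X_1, X_2; Y) - I(X_1; Y) - I(X_2; Y)$; and $\Delta I$ compares $p(y|x)$ with an independent model built from $p_{ind}(x|y) = \prod_i p(x_i|y)$, consistent with the quantities appearing in Eq. (33). The XOR-gate is included only as a contrasting case, and the function names are illustrative.

```python
from collections import defaultdict
from math import log2


def marginal(p, idx):
    """Marginalize a joint distribution {state: prob} onto the positions in idx."""
    out = defaultdict(float)
    for state, prob in p.items():
        out[tuple(state[i] for i in idx)] += prob
    return out


def entropy(p, idx):
    """Shannon entropy in bits of the variables at positions idx."""
    return -sum(q * log2(q) for q in marginal(p, idx).values() if q > 0)


def mutual_information(p, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for two groups of positions."""
    return entropy(p, a) + entropy(p, b) - entropy(p, tuple(a) + tuple(b))


def interaction_information_2x(p):
    """Two-input form: I(X1,X2;Y) - I(X1;Y) - I(X2;Y), states are (x1, x2, y)."""
    return (mutual_information(p, (0, 1), (2,))
            - mutual_information(p, (0,), (2,))
            - mutual_information(p, (1,), (2,)))


def delta_i(p, x_idx, y_idx):
    """Sum of p(x,y) * log2[p(y|x) / p_ind(y|x)] under the independent model."""
    p_y = marginal(p, (y_idx,))
    p_x = marginal(p, x_idx)
    pair_tables = [marginal(p, (i, y_idx)) for i in x_idx]

    def p_ind_x_given_y(x, y):
        # Product of the single-variable conditionals p(x_i | y).
        prod = 1.0
        for x_i, table in zip(x, pair_tables):
            prod *= table.get((x_i, y), 0.0) / p_y[(y,)]
        return prod

    total = 0.0
    for state, p_xy in p.items():
        if p_xy == 0:
            continue
        x = tuple(state[i] for i in x_idx)
        y = state[y_idx]
        p_ind_x = sum(p_y[(v,)] * p_ind_x_given_y(x, v) for (v,) in list(p_y))
        p_ind_y_given_x = p_y[(y,)] * p_ind_x_given_y(x, y) / p_ind_x
        total += p_xy * log2((p_xy / p_x[x]) / p_ind_y_given_x)
    return total


# Assumed distributions; each state is (x1, x2, y) with uniform inputs.
example5 = {(x1, x2, x1 + 2 * x2): 0.25 for x1 in (0, 1) for x2 in (0, 1)}
xor_gate = {(x1, x2, x1 ^ x2): 0.25 for x1 in (0, 1) for x2 in (0, 1)}

ii_ex5 = interaction_information_2x(example5)
di_ex5 = delta_i(example5, (0, 1), 2)
ii_xor = interaction_information_2x(xor_gate)
di_xor = delta_i(xor_gate, (0, 1), 2)
print(ii_ex5, di_ex5, ii_xor, di_xor)
```

In Example 5 the independent model already reproduces $p(y|x)$ exactly, so nothing is gained by knowing the correlations between the X variables. For the XOR-gate the independent model is uninformative about Y, and both quantities become non-zero.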



D. Example 6 

The example shown in Table V demonstrates a significant feature of the total
correlation. Even when no information is passing to one of the variables
considered, the total correlation and the dual total correlation can still produce
non-zero results if interactions are present between other variables in the
system. This result can be clearly understood using the expression for the total
correlation in Eq. (15) and the expression for the dual total correlation in
Eq. (18). The total correlation sums the information passing between variables
from the smallest scale (two variables) to the largest scale (n variables). It
will detect relationships at all levels and it is unable to differentiate between
those levels. The dual total correlation compares the total correlation to the
amount of information passing between each individual variable and all other
variables considered together as a single vector-valued variable. As with the dual
total correlation, the total correlation does not differentiate between the X and
Y variables, unlike several of the other information measures.

In this case, Y has no entropy, so all information terms that depend on the
entropy of Y (i.e., all of the other information measures considered here) are
zero. This is expected since all of the other information measures are either
explicitly focused on the relationship between the X variables and the Y variable
or only focus on interactions that involve all variables.
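A small numerical illustration of this behavior follows. The distribution used here is a hypothetical stand-in for Table V, chosen only to satisfy the description in the text: Y is constant, and $X_1$ and $X_2$ are perfectly correlated. The total correlation is computed as the sum of the single-variable entropies minus the joint entropy, and the dual total correlation as in the XOR-gate discussion.

```python
from collections import defaultdict
from math import log2


def marginal(p, idx):
    """Marginalize a joint distribution {state: prob} onto the positions in idx."""
    out = defaultdict(float)
    for state, prob in p.items():
        out[tuple(state[i] for i in idx)] += prob
    return out


def entropy(p, idx):
    """Shannon entropy in bits of the variables at positions idx."""
    return -sum(q * log2(q) for q in marginal(p, idx).values() if q > 0)


def total_correlation(p):
    """Sum of single-variable entropies minus the joint entropy."""
    n = len(next(iter(p)))
    return sum(entropy(p, (i,)) for i in range(n)) - entropy(p, tuple(range(n)))


def dual_total_correlation(p):
    """Joint entropy minus the entropy of each variable given all the others."""
    n = len(next(iter(p)))
    everyone = tuple(range(n))
    h_all = entropy(p, everyone)
    return h_all - sum(
        h_all - entropy(p, tuple(j for j in everyone if j != i)) for i in everyone
    )


# Hypothetical Example-6-like system: states are (x1, x2, y), Y is always 0.
constant_y = {(0, 0, 0): 0.5, (1, 1, 0): 0.5}

tc = total_correlation(constant_y)
dtc = dual_total_correlation(constant_y)
# I(X1,X2;Y) = H(X1,X2) + H(Y) - H(X1,X2,Y)
mi_xy = (entropy(constant_y, (0, 1)) + entropy(constant_y, (2,))
         - entropy(constant_y, (0, 1, 2)))
print(tc, dtc, mi_xy)
```

Under this assumed distribution both symmetric measures report the 1 bit shared by the X variables, while the mutual information with Y is zero.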
E. Examples 7 and 8: three-input Boolean logic gates

The three-input Boolean logic gate examples shown in Table VI allow for a
comparison between the interaction information, the redundancy-synergy index,
Varadan's synergy, and the partial information decomposition. The three-way
XOR-gate produces results similar to those of the XOR-gate shown in Table I. All
of the information measures indicate the presence of a synergistic interaction.
The partial information decomposition is able to localize the synergy to an
interaction between all three X variables.

Significant differences appear between the information measures when an extraneous
$X_3$ variable is added to a basic XOR-gate between $X_1$ and $X_2$. In this case,
the interaction information is zero because there is no synergy present between
all three X variables. This is despite the fact that the interaction information
indicated synergy was present for the basic XOR-gate. Thus, we can see that the
interaction information focuses only on interactions between all of the X
variables and the Y variable. A similar result is observed with Varadan's synergy.
Despite the fact that it indicated the presence of synergy in the basic XOR-gate,
Varadan's synergy does not indicate synergy is present in this logic gate because
it also focuses only on interactions between all of the X variables and the Y
variable. Both the redundancy-synergy index and the partial information
decomposition return results that indicate the presence of synergy between the X
variables and the Y variable, but only the partial information decomposition is
able to localize the synergy to the $X_1$ and $X_2$ variables.
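Two of the comparisons above can be reproduced with a few lines of code. The sketch assumes that the interaction information is the alternating sum of subset entropies, with the sign chosen so that the two-input XOR-gate gives a positive (synergistic) value, and that the redundancy-synergy index is the joint mutual information minus the sum of the individual mutual informations. These forms reproduce the values described for the two gates, but they are written from the description here rather than copied from the paper's equations, and the function names are illustrative.

```python
from collections import defaultdict
from itertools import combinations, product
from math import log2


def marginal(p, idx):
    """Marginalize a joint distribution {state: prob} onto the positions in idx."""
    out = defaultdict(float)
    for state, prob in p.items():
        out[tuple(state[i] for i in idx)] += prob
    return out


def entropy(p, idx):
    """Shannon entropy in bits of the variables at positions idx."""
    return -sum(q * log2(q) for q in marginal(p, idx).values() if q > 0)


def mutual_information(p, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for two groups of positions."""
    return entropy(p, a) + entropy(p, b) - entropy(p, tuple(a) + tuple(b))


def interaction_information(p):
    """Negative alternating sum of the entropies of all non-empty subsets."""
    n = len(next(iter(p)))
    total = 0.0
    for k in range(1, n + 1):
        for subset in combinations(range(n), k):
            total += (-1) ** (n - k) * entropy(p, subset)
    return -total


def redundancy_synergy_index(p, x_idx, y_idx):
    """I(all X; Y) minus the sum of I(X_i; Y) over the individual X variables."""
    joint = mutual_information(p, x_idx, (y_idx,))
    return joint - sum(mutual_information(p, (i,), (y_idx,)) for i in x_idx)


# States are (x1, x2, x3, y); all eight input combinations are equally likely.
inputs = list(product((0, 1), repeat=3))
xor3 = {(a, b, c, a ^ b ^ c): 1 / 8 for a, b, c in inputs}
x1x2xor = {(a, b, c, a ^ b): 1 / 8 for a, b, c in inputs}

results = {
    name: (
        mutual_information(gate, (0, 1, 2), (3,)),
        interaction_information(gate),
        redundancy_synergy_index(gate, (0, 1, 2), 3),
    )
    for name, gate in (("3XOR", xor3), ("X1X2XOR", x1x2xor))
}
print(results)
```

The interaction information only involves the full set of four variables, so it vanishes once $X_3$ becomes irrelevant. The redundancy-synergy index still sees the excess of the joint information over the individual contributions, but it returns a single number and cannot say which X variables are responsible.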



F. Examples 9 to 13: simple model networks 

In an effort to discuss results more directly applicable to several research
topics, we will now apply the multivariate information measures to several
variations of a simple model network. The general structure of the network is
shown in Fig. 1. The network contains three nodes, each of which can be in one of
two states (0 or 1) at any given point in time. The default state of each node is
0. At each time step, there is a certain probability, call it $p_r$, that a given
node will be in state 1. The probability that a given node is in state 1 can also
be increased if it receives a connection from another node. This driving effect is
denoted by $p_{1y}$ for the connection from $X_1$ to Y, $p_{12}$ for the
connection from $X_1$ to $X_2$, and $p_{2y}$ for the connection from $X_2$ to Y.
All states of the network are determined simultaneously and are independent of the
previous states of the network. (See Appendix C for further details regarding this
model.)

For this simple system, we will discuss five combinations of $p_r$, $p_{1y}$,
$p_{12}$, and $p_{2y}$ that correspond to noteworthy network topologies. The
information theoretic results for these examples are presented in Table VII.



TABLE VI. Examples 7 and 8: three-input Boolean logic gates. All partial
information decomposition terms not shown in the table are zero. 3XOR: Three-way
XOR-gate. All information measures produce consistent results. X1X2XOR: XOR-gate
involving only $X_1$ and $X_2$. The redundancy-synergy index identifies a
synergistic interaction and $\Delta I$ identifies the importance of correlations
between the X variables. The partial information decomposition also identifies the
variables involved in the synergistic interaction. The interaction information and
Varadan's synergy do not identify a synergistic interaction.

                                      3XOR    X1X2XOR
p(x1,x2,x3,y)    x1  x2  x3           y       y
1/8              0   0   0            0       0
1/8              1   0   0            1       1
1/8              0   1   0            1       1
1/8              1   1   0            0       0
1/8              0   0   1            1       0
1/8              1   0   1            0       1
1/8              0   1   1            0       1
1/8              1   1   1            1       0

Measure                               3XOR    X1X2XOR
$I(X_1; Y)$                           0       0
$I(X_2; Y)$                           0       0
$I(X_3; Y)$                           0       0
$I(X_1, X_2, X_3; Y)$                 1       1
$II(X_1; X_2; X_3; Y)$                1       0
$TC(X_1; X_2; X_3; Y)$                1       1
$DTC(X_1; X_2; X_3; Y)$               3       2
$\Delta I(X_1, X_2, X_3; Y)$          1       1
$RSI(X_1, X_2, X_3; Y)$               1       1
$VS(X_1, X_2, X_3; Y)$                1       0
$\Pi_R(Y; \{12\})$                    0       1
$\Pi_R(Y; \{123\})$                   1       0


FIG. 1. Structure of model network used for Examples 9 to 
13. 



Example 9 represents a system where the X nodes independently drive the Y node.
Similarly to Example 5, the partial information decomposition indicates that the
information from $X_1$ and $X_2$ is entirely redundant and synergistic. This
result is somewhat counterintuitive because the X nodes act independently. Again,
this is due to the structure of the minimum information in Eq. (30). Each X
variable provides the same information about each state of Y, so the partial
information decomposition returns the result that all of the information provided
by each X variable about Y is redundant. The interaction information returns a
result that indicates the presence of synergy, though the magnitude of this
interaction is less than the magnitudes of the synergy and redundancy results from
the partial information decomposition. Note that this is the only network for
which the interaction information indicates the presence of synergy.

TABLE VII. Examples 9 to 13: simple model network. All information values are in
millibits.

Example 10 is similar to Example 9 with the exception that $X_1$ now also drives
$X_2$. Several interesting results are produced for this example. For instance,
the total correlation and dual total correlation are significantly elevated in
comparison to the other examples. In this example, the maximum number of
interactions is present between the nodes. So, this result agrees with
expectations because the total correlation and dual total correlation reflect the
total amount of interactions at all scales between all variables. Also,
$\Delta I$ obtains its highest value for this example because the actual data and
the independent model from Eq. (19) are more dissimilar due to the interactions
between $X_1$ and $X_2$. Interestingly, the partial information decomposition does
not indicate the presence of unique information from $X_1$, despite the fact that
$X_1$ is directly influencing Y. As with Example 9 above, the partial information
decomposition returns the result that all of the information $X_1$ provides about
Y is redundant. In this case, this result is more intuitive because $X_1$ also
drives $X_2$. The interaction information returns a significantly larger magnitude
result for this example. This is intuitive given the fact that $X_1$ is driving
$X_2$ and that both X variables are driving the Y variable. However, it should be
noted that the magnitude of the interaction information is significantly less than
the magnitude of the synergy and redundancy from the partial information
decomposition. Also, the interaction information result implies the presence of
redundancy, unlike Example 9.

Example 11 represents a common problem case when attempting to infer connectivity
based solely on node activity. Node $X_1$ drives $X_2$, which in turn drives Y. If
the activity of $X_2$ is not known, it would appear that $X_1$ is driving Y
directly. The partial information decomposition returns the result that any
information provided by $X_1$ about Y is redundant and that the vast majority of
the information provided by $X_1$ and $X_2$ about Y is unique information from
$X_2$. Both of these results appear to accurately reflect the structure of the
network.

Example 12 also represents a common problem case when determining connectivity.
Node $X_1$ drives Y and $X_2$. If the activity of $X_1$ is not known, it would
appear that $X_2$ is driving Y, when, in fact, no connection exists from $X_2$ to
Y. Similarly to Example 11, the partial information decomposition identifies the
majority of the information from $X_1$ and $X_2$ about Y as unique information
from $X_1$ and the remaining information as redundant. Again, these results appear
to accurately reflect the structure of the network.

The final example (Example 13) is similar to Example 6 above. In this case, no
connections exist from $X_1$ or $X_2$ to Y, but $X_1$ drives $X_2$. Almost all of
the information measures indicate a lack of information transmission. However, the
total correlation and the dual total correlation pick up the interaction between
$X_1$ and $X_2$. The values of the total correlation and the dual total
correlation vary approximately linearly with the number of connections in each
network example. This, again, demonstrates the fact that the total correlation and
the dual total correlation measure interactions between all variables at all
scales.



G. Analysis of dissociated neural culture 

We will now present the results of applying the infor- 
mation measures discussed above to spiking data from a 
dissociated neural culture as an illustration of the type 
of analysis that is possible using these information mea- 
sures. 

The data we chose to analyze are described in Wagenaar et al. and are freely
available online [42]. The data contain multiunit spiking activity for each of 60
electrodes in the multielectrode array on which the dissociated neural culture was
grown. Specifically, we used data from neural culture 2-2. All details regarding
the production and maintenance of the culture can be found in [42]. We analyzed
recordings from eight points in the development of the culture: days in vitro
(DIV) 4, 7, 12, 16, 20, 25, 31, and 33. The DIV 16 recording was 60 minutes long,
while all others were 45 minutes long.

For this analysis, the data were binned at 16 ms. The probability distributions
necessary for the computation of the information measures were created by
examining the spike trains for groups of three non-identical electrodes. For a
given group of electrodes, one electrode was labeled the Y electrode, while the
other two were labeled the $X_1$ and $X_2$ electrodes. Then, for all time steps in
the spike trains, the states of the electrodes (spiking or not spiking) were
recorded at time $t$ for the $X_1$ and $X_2$ electrodes and at time $t + 1$ for
the Y electrode. Next, by counting how many times each state appeared throughout
the spike train, the joint probabilities $p(y_{t+1}, x_{1,t}, x_{2,t})$ were
calculated, which were then used to calculate the information measures discussed
above. This process was repeated for each group of non-identical electrodes.
However, to avoid double counting, groups with swapped X variable assignments were
only analyzed once. For instance, the group $X_1$ = electrode 3, $X_2$ = electrode
4, and Y = electrode 5 was analyzed, but the group $X_1$ = electrode 4, $X_2$ =
electrode 3, and Y = electrode 5 was not analyzed. In order to compensate for the
changing firing rate through development of the culture, all information values
for a given group were normalized by the entropy of the Y electrode.
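The counting procedure described above can be sketched as follows. This is not the analysis code used for the paper: the function names, the toy spike times, and the choice to mark a bin as spiking when it contains at least one spike are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def bin_spikes(spike_times_ms, duration_ms, bin_ms=16):
    """Return a 0/1 list marking which bins contain at least one spike."""
    n_bins = int(duration_ms // bin_ms)
    binned = [0] * n_bins
    for t in spike_times_ms:
        b = int(t // bin_ms)
        if 0 <= b < n_bins:
            binned[b] = 1
    return binned


def delayed_joint(x1, x2, y):
    """Estimate p(y[t+1], x1[t], x2[t]) by counting states over all time steps."""
    n_steps = len(y) - 1
    counts = Counter((y[t + 1], x1[t], x2[t]) for t in range(n_steps))
    return {state: c / n_steps for state, c in counts.items()}


def electrode_groups(n_electrodes):
    """Yield (x1, x2, y) once per group, skipping swapped X assignments."""
    for y in range(n_electrodes):
        others = [e for e in range(n_electrodes) if e != y]
        for x1, x2 in combinations(others, 2):
            yield x1, x2, y


# Hypothetical spike times in ms for three electrodes over a 96 ms recording.
x1 = bin_spikes([3, 40, 70], 96)
x2 = bin_spikes([35, 50], 96)
y = bin_spikes([20, 60, 90], 96)

joint = delayed_joint(x1, x2, y)
n_groups = sum(1 for _ in electrode_groups(60))
print(joint, n_groups)
```

With 60 electrodes this enumeration visits each choice of Y electrode together with each unordered pair of the remaining 59 electrodes. The resulting joint distribution can be passed to any of the information measures, and each value can then be divided by the entropy of the Y electrode.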

To illustrate the statistical significance of the information measure values, we
also created and analyzed a randomized data set from the original neural culture
data. The randomization was accomplished by splitting each electrode spike train
at a randomly chosen point and swapping the two resulting pieces. By doing this,
the structure of each electrode spike train is almost entirely preserved, but the
temporal relationship between the electrode spike trains is significantly
disrupted.
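A minimal sketch of this randomization is shown below, assuming that each binned spike train is stored as a list and that a separate cut point is drawn for each electrode. The function name, the toy data, and the fixed seed are illustrative only.

```python
import random


def split_and_swap(train, rng):
    """Cut a spike train at a random interior point and swap the two pieces."""
    cut = rng.randrange(1, len(train))
    return train[cut:] + train[:cut]


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
trains = {
    "x1": [1, 0, 1, 0, 1, 0, 0, 1],
    "x2": [0, 0, 1, 1, 0, 0, 1, 0],
    "y": [0, 1, 0, 1, 0, 1, 0, 0],
}
surrogate = {name: split_and_swap(train, rng) for name, train in trains.items()}
print(surrogate)
```

Each surrogate train keeps its spike count and, apart from the single junction, its internal timing. Only the alignment between electrodes changes.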

The results of these analyses are presented in Fig. 2 and Fig. 3. The results
shown in Fig. 2 indicate that at day 4 essentially no information was being
transmitted in the network. However, by day 7, a great deal of information was
being transmitted, as can be seen by the peaks in the mutual information (Fig. 2E)
and the total correlation (Fig. 2F). As the culture continued to develop after day
7, most information measures decreased and then slowly increased to maxima on the
last day, DIV 33. Interestingly, $\Delta I$ (Fig. 2G) also increased at day 7 but,
unlike most of the other measures, continued to increase steadily afterwards. The
total correlation (Fig. 2F) mimics the changes in the mutual information
(Fig. 2E), but because the total correlation measures the total amount of
information being transmitted among the X and Y variables, it possessed higher
values than the mutual information.

The relationship between the interaction information and the partial information
decomposition was also illustrated through development. As the culture developed,
the PID synergy was larger than the PID redundancy. Then, between days 31 and 33,
the PID synergy became significantly smaller than the PID redundancy. In the
interaction information, this relationship was expressed by positive values
through most of the culture's development, with the exception of large negative
values at days 7 and 33. However, notice that groups of electrodes with positive
and negative interaction information values were found in each recording. To
further investigate this relationship, we plotted the distribution of PID synergy
and PID redundancy for groups of electrodes (Fig. 3). This plot shows that the
network contained groups of electrodes with slightly more PID redundancy than PID
synergy at day 7, but that, after that point, the total amount of information
decreased and became more biased towards PID synergy at day 12. From that point,
the total amount of information increased up to the last recording, where the
network was once again biased towards PID redundancy. So, we can relate the
results from the partial information decomposition and the interaction information
using Fig. 3 by noting that, while the PID synergy and PID redundancy for a given
group of electrodes determine a point's position in Fig. 3, the interaction
information describes how far that point is from the equilibrium line. Given the
fact that many points in Fig. 3 are near the equilibrium line, the partial
information decomposition finds that many groups of electrodes contain synergistic
and redundant interactions simultaneously. This feature would be lost by only
examining the interaction information.
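The geometric reading of Fig. 3 can be made explicit. As a sketch, assume that Eq. (31) takes the two-input form in which the interaction information equals the PID synergy minus the PID redundancy, and that the equilibrium line is the line of equal synergy and redundancy (both are assumptions about material outside this section). For a group of electrodes plotted at redundancy $R$ and synergy $S$, the perpendicular distance $d$ to that line is then proportional to the magnitude of the interaction information:

```latex
II(X_1; X_2; Y) = S - R,
\qquad
d = \frac{|S - R|}{\sqrt{2}} = \frac{|II(X_1; X_2; Y)|}{\sqrt{2}}
```

The sign of the interaction information indicates on which side of the line the point lies, while the position along the line, which reflects how much synergy and redundancy are present together, is not captured by the interaction information at all.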

Obviously, this analysis could be made significantly 
more complex and interesting. For instance, the analysis 
could be improved by including more data sets, varying 
the variable assignments, using different bin sizes, using 
more robust methods to test statistical significance, and 
so forth. However, based on this simple illustration, we 
believe that it is clear that the information analysis meth- 
ods discussed herein could be used to address interesting 
questions related to this system, or other systems. For instance, it may be
possible to relate these changes through development to previous work on changes
in dissociated cultures through development [42-46].



V. DISCUSSION 

Based on the results from several simple systems, we 
were able to explore the properties of the multivariate 
information measures discussed in this paper. We will 
now discuss each measure in turn. 

The oldest multivariate information measure, the interaction information, was
shown to focus on interactions between all X variables and the Y variable using
the three-input Boolean logic gate examples. Furthermore, the two-input AND-gate
demonstrated how the interaction information is related to the excess information
provided by both X variables about the Y variable beyond the total amount of
information those X variables provide about Y when considered individually. Also,
that example demonstrated the relationship between the interaction information and
the partial information decomposition as shown in Eq. (31). For the model network
examples, the interaction information had its largest magnitude when interactions
were present between all three nodes. Also, for these examples, the interaction
information indicated the presence of synergy when both nodes $X_1$ and $X_2$
drove Y, but not each other (Example 9), while it indicated the presence of
redundancy when either node $X_1$ or $X_2$ drove Y and $X_1$ drove $X_2$
(Examples 10 to 12). When the interaction information was applied to data from a
developing neural culture, it showed changes in the type of interactions present
in the network during development.

FIG. 2. Many information measures show changes over neural development. A) PID
Synergy. B) PID Redundancy. C) Positive Interaction Information values. D)
Negative Interaction Information values. E) Mutual Information. F) Total
Correlation. G) $\Delta I$. H) PID Unique Information. All information values are
normalized by the entropy of the Y electrode. Each individual data point
represents one group of electrodes. To improve clarity, the data points are
jittered randomly around the DIV and only 0.4% of the data points are shown. The
line plots show the 90th percentile, median, and 10th percentile of all the data
for a given DIV. Note that as the culture matured, the total amount of information
transmitted increased and the types of interactions present in the network
changed.

FIG. 3. The balance of PID synergy and PID redundancy changed during development.
Distribution of normalized PID synergy and PID redundancy. Each data point
represents the information values for one group of electrodes (only 2% of the data
are shown to improve clarity). Diamonds represent mean values for a given DIV.

In contrast to the interaction information, the total 
correlation was shown to sum interactions among all vari- 
ables at all scales using Example 6. In other words, 
the value of the total correlation for any system incor- 
porates interactions between groups of variables at all 
scales. This feature was made apparent using the model 
network examples. There, the total correlation varied 
approximately linearly with the number of connections 
present in the network. Furthermore, the total corre- 
lation is symmetric with regard to all variables consid- 
ered, whereas the other information measures focus on 
the relationship between the set of X variables and the 
Y variable. When applied to the data from the neural 
culture, the total correlation and the mutual information 
both showed increases in the total amount of information 
being transmitted in the network through development. 

The dual total correlation was found to be similar to the total correlation in
that neither differentiates between the X and Y variables. Also, like the total
correlation, the dual total correlation varied approximately linearly with the
number of connections in the model network examples. The function of the dual
total correlation was also highlighted with the XOR-gate example. There, we saw
that the dual total correlation compares the uncertainty with regard to all
variables with the total uncertainty that remains about each variable if all the
other variables are known.

Using the AND-gate, $\Delta I$ was shown to measure a subtly different quantity
compared to the other information measures. The other information measures seek
to evaluate the interactions between the X variables and the Y variable given that
one knows the values of all variables simultaneously (i.e., in the case that the
total joint probability distribution is known). $\Delta I$ compares that situation
to a model where it is assumed that the X variables act independently of one
another in an effort to measure the importance of knowing the correlations between
the X variables. Clearly, this goal is similar to the goals of the other
information measures. However, given the fact that $\Delta I$ can be greater than
$I(X_1, X_2; Y)$, as was shown in Example 4, and if we assume the synergy and
redundancy are some portion of $I(X_1, X_2; Y)$, $\Delta I$ cannot be the synergy
or the redundancy. $\Delta I$ can provide useful information about a system, but
the distinction between the structure of $\Delta I$ and the other information
measures, along with the fact that $\Delta I$ cannot be the synergy or redundancy
as previously defined, should be considered when choosing the appropriate
information measure with which to perform an analysis. Unlike several of the other
information measures, which showed changes in the types of interactions present in
the developing neural culture, $\Delta I$ showed a uniform increase in the
importance of correlations in the network throughout development.

The redundancy-synergy index and Varadan's synergy 
are identical to the interaction information when only two 
X variables are considered. However, when we examined 
three-input Boolean logic gates, we found that Varadan's 
synergy - like the interaction information - was unable to 
detect a synergistic interaction among a subset of the X 
variables and the Y variable. The redundancy-synergy 
index was able to detect this synergy, but it was unable 
to localize the subset of X variables involved in the in- 
teraction. 

The partial information decomposition provided interesting and possibly useful
results for several of the example systems. When applied to the Boolean logic
gates, the partial information decomposition was able to identify the X variables
involved in the interactions, unlike all other information measures. Using the
AND-gate example, we saw that the partial information decomposition found that
both synergy and redundancy were present in the system, unlike the interaction
information, which indicated only synergy was present. Perhaps the most
illuminating example system for the partial information decomposition was Example
5. In that case, the partial information decomposition concluded that each X
variable provided entirely redundant information because each X variable provided
the same amount of information about each state of Y, even though each X variable
provided information about different states of Y. This point highlights how the
partial information decomposition defines redundancy via Eq. (30). It calculates
the redundant contributions based only on the quantity of information each X
variable provides about each state of Y. In the developing neural culture, the
partial information decomposition, similar to the interaction information, showed
a changing balance between synergy and redundancy through development. However,
unlike the interaction information, the partial information decomposition was able
to separate simultaneous synergistic and redundant interactions.



VI. CONCLUSION

We applied several multivariate information measures to simple example systems in
an attempt to explore the properties of the information measures. We found that
the information measures produce similar or identical results for some systems
(e.g., the XOR-gate), but that the measures produce different results for other
systems. In examining these results, we found several subtle differences between
the information measures that impacted the results. Based on the understanding
gained from these simple systems, we were able to apply the information measures
to spiking data from a neural culture through its development. Based on this
illustrative analysis, we saw interesting changes in the amount of information
being transmitted and the interactions present in the network.

We wish to emphasize that none of these information measures is the "right"
measure. All of them produce results that can be used to learn something about the
system being studied. We hope that this work will assist other researchers as they
deliberate on the specific questions they wish to answer about a given system so
that they may use the multivariate information measures that best suit their
goals.



Appendix A: Additional total correlation derivation

Eq. (14) can be rewritten as Eq. (15) by adding and subtracting several joint
entropy terms and then using Eq. (2). For instance, when $n = 3$, we have:

$$
\begin{aligned}
TC(S) &= \left( \sum_{X_i \in S} H(X_i) \right) - H(S) \\
      &= H(X_1) + H(X_2) + H(X_3) - H(X_1, X_2, X_3) \\
      &= H(X_1) + H(X_2) - H(X_1, X_2) + H(X_1, X_2) + H(X_3) - H(X_1, X_2, X_3) \\
      &= I(X_1; X_2) + I(X_1, X_2; X_3) \qquad \text{(A1)}
\end{aligned}
$$

A similar substitution can be performed for $n > 3$.
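The identity in Eq. (A1) can be checked numerically for any three-variable distribution. The sketch below uses an AND-gate with uniform inputs as an arbitrary test case (the third variable plays the role of $X_3$) and assumes that the mutual information is the usual entropy combination $I(A;B) = H(A) + H(B) - H(A,B)$. The function names are illustrative.

```python
from collections import defaultdict
from math import log2


def marginal(p, idx):
    """Marginalize a joint distribution {state: prob} onto the positions in idx."""
    out = defaultdict(float)
    for state, prob in p.items():
        out[tuple(state[i] for i in idx)] += prob
    return out


def entropy(p, idx):
    """Shannon entropy in bits of the variables at positions idx."""
    return -sum(q * log2(q) for q in marginal(p, idx).values() if q > 0)


def mutual_information(p, a, b):
    """I(A;B) = H(A) + H(B) - H(A,B) for two groups of positions."""
    return entropy(p, a) + entropy(p, b) - entropy(p, tuple(a) + tuple(b))


# Arbitrary test distribution: states are (x1, x2, x3) with x3 = x1 AND x2.
and_gate = {(x1, x2, x1 & x2): 0.25 for x1 in (0, 1) for x2 in (0, 1)}

# Sum of single-variable entropies minus the joint entropy.
tc_entropies = (sum(entropy(and_gate, (i,)) for i in range(3))
                - entropy(and_gate, (0, 1, 2)))
# Chain of mutual informations from the last line of Eq. (A1).
tc_chain = (mutual_information(and_gate, (0,), (1,))
            + mutual_information(and_gate, (0, 1), (2,)))
print(tc_entropies, tc_chain)
```

Here the first mutual information term is zero because the inputs are independent, so the whole total correlation comes from the information that the pair of inputs shares with the third variable.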



Appendix B: Additional dual total correlation derivation

Eq. (16) can be rewritten as Eq. (18) by substituting the expression for the total
correlation in Eq. (14) and then applying Eq. (2).

$$
\begin{aligned}
DTC(S) &= \left( \sum_{X_i \in S} H(S/X_i) \right) - (n - 1) H(S) \\
       &= \left( \sum_{X_i \in S} H(S/X_i) + H(X_i) \right) - n H(S) - TC(S) \\
       &= \left( \sum_{X_i \in S} I(S/X_i; X_i) \right) - TC(S) \qquad \text{(B1)}
\end{aligned}
$$



Appendix C: Model Network 

Given values for $p_r$, $p_{1y}$, $p_{12}$, and $p_{2y}$, the joint probabilities
for all possible states of the network can be calculated. For example:

$$p(x_1 = 1) = p_r \qquad \text{(C1)}$$

$$p(x_1 = 0) = 1 - p_r \qquad \text{(C2)}$$

$$p(x_2 = 1 | x_1 = 1) = p_r + p_{12} - p_r p_{12} \qquad \text{(C3)}$$

$$p(x_2 = 0 | x_1 = 1) = 1 - p(x_2 = 1 | x_1 = 1) \qquad \text{(C4)}$$

The joint probabilities for the examples discussed in the main text of the article
are shown in Table VIII.
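One way to extend Eqs. (C1) to (C4) to the full joint distribution is sketched below. Eq. (C3) has the form of independent causes combining, so the sketch assumes that the same rule applies to Y when it is driven by one or both X nodes. That extension, the function names, and the parameter values in the usage lines are assumptions for illustration and are not taken from Table VIII.

```python
from itertools import product


def p_on(p_r, drives):
    """Probability of state 1: spontaneous p_r combined with each active drive."""
    p_off = 1.0 - p_r
    for p_drive in drives:
        p_off *= 1.0 - p_drive
    return 1.0 - p_off


def network_joint(p_r, p_1y, p_12, p_2y):
    """Joint probabilities p(x1, x2, y) for the three-node model network."""
    joint = {}
    for x1, x2, y in product((0, 1), repeat=3):
        p_x1 = p_r if x1 else 1.0 - p_r
        # X2 is driven by X1 only when X1 is in state 1 (Eq. C3).
        q2 = p_on(p_r, [p_12] if x1 else [])
        p_x2 = q2 if x2 else 1.0 - q2
        # Assumed extension: Y combines drives from whichever X nodes are on.
        qy = p_on(p_r, ([p_1y] if x1 else []) + ([p_2y] if x2 else []))
        p_y = qy if y else 1.0 - qy
        joint[(x1, x2, y)] = p_x1 * p_x2 * p_y
    return joint


# Hypothetical parameter values with all three connections present.
p_r, p_1y, p_12, p_2y = 0.02, 0.1, 0.1, 0.1
joint = network_joint(p_r, p_1y, p_12, p_2y)
p_x2_given_x1 = (joint[(1, 1, 0)] + joint[(1, 1, 1)]) / p_r
print(sum(joint.values()), p_x2_given_x1)
```

Setting a connection strength to zero removes that connection, so the different network topologies correspond to different choices of which of $p_{1y}$, $p_{12}$, and $p_{2y}$ are non-zero.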
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TABLE VIII. Joint probabilities for examples 9 to 13. 
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